In this Milestone, you'll role-play as an intern at NASA, analyzing weather data collected by a planetary rover. The dataset, planet_weather.csv, contains atmospheric measurements from a planet in our solar system, but the planet's identity has not been disclosed. Your objective is to apply data inspection, cleaning, and analysis techniques to draw conclusions about the planet from the provided weather data.
The dataset is located in the datasets/ folder and contains the following information:
terrestrial_date: Date on planet Earth, captured as yyyy-mm-dd.
sol: number of elapsed planetary days since beginning measurement.
ls: solar longitude. 0: fall equinox. 90: winter solstice. 180: spring equinox. 270: summer solstice.
month: the month number on the mystery planet.
min_temp: the minimum temperature, in Celsius, during a single day.
max_temp: the maximum temperature, in Celsius, during a single day.
pressure: atmospheric pressure, in pascals.
wind_speed: average wind speed, in meters per second.
atmo_opacity: atmospheric opacity.
To start, import both the pandas and plotly.express libraries, and load the data into a DataFrame.
# import pandas and plotly express libraries
import pandas as pd
import plotly.express as px
# load planet_weather.csv data from datasets folder
weather = pd.read_csv('datasets/planet_weather.csv')
Before analyzing anything, it's essential to understand the structure and contents of your dataset. You'll start by previewing the data and checking for missing values or unusual patterns.
# preview the data
weather.head()
| | id | terrestrial_date | sol | ls | month | min_temp | max_temp | pressure | wind_speed | atmo_opacity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1895 | 2018-02-27 | 1977 | 135 | Month 5 | -77.0 | -10.0 | 727.0 | NaN | Sunny |
| 1 | 1893 | 2018-02-26 | 1976 | 135 | Month 5 | -77.0 | -10.0 | 728.0 | NaN | Sunny |
| 2 | 1894 | 2018-02-25 | 1975 | 134 | Month 5 | -76.0 | -16.0 | 729.0 | NaN | Sunny |
| 3 | 1892 | 2018-02-24 | 1974 | 134 | Month 5 | -77.0 | -13.0 | 729.0 | NaN | Sunny |
| 4 | 1889 | 2018-02-23 | 1973 | 133 | Month 5 | -78.0 | -18.0 | 730.0 | NaN | Sunny |
1. How many rows and columns are there in the dataset?
# dataset rows and columns
print(weather.shape)
(1894, 10)
2. What are the names of all the columns?
# dataset columns
weather.columns
Index(['id', 'terrestrial_date', 'sol', 'ls', 'month', 'min_temp', 'max_temp',
'pressure', 'wind_speed', 'atmo_opacity'],
dtype='object')
3. What is the data type of each column?
# data types of each column
weather.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1894 entries, 0 to 1893
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   id                1894 non-null   int64
 1   terrestrial_date  1894 non-null   object
 2   sol               1894 non-null   int64
 3   ls                1894 non-null   int64
 4   month             1894 non-null   object
 5   min_temp          1867 non-null   float64
 6   max_temp          1867 non-null   float64
 7   pressure          1867 non-null   float64
 8   wind_speed        0 non-null      float64
 9   atmo_opacity      1894 non-null   object
dtypes: float64(4), int64(3), object(3)
memory usage: 148.1+ KB
4. How many null values are there in each column? For each column, sum up the number of null values.
# null values in each column
weather.isnull().sum()
id                     0
terrestrial_date       0
sol                    0
ls                     0
month                  0
min_temp              27
max_temp              27
pressure              27
wind_speed          1894
atmo_opacity           0
dtype: int64
5. Provide a statistical summary of the DataFrame.
# Statistical summary of the DataFrame
weather.describe()
| | id | sol | ls | min_temp | max_temp | pressure | wind_speed |
|---|---|---|---|---|---|---|---|
| count | 1894.000000 | 1894.000000 | 1894.000000 | 1867.000000 | 1867.000000 | 1867.000000 | 0.0 |
| mean | 948.372228 | 1007.930306 | 169.180570 | -76.121050 | -12.510445 | 841.066417 | NaN |
| std | 547.088173 | 567.879561 | 105.738532 | 5.504098 | 10.699454 | 54.253226 | NaN |
| min | 1.000000 | 1.000000 | 0.000000 | -90.000000 | -35.000000 | 727.000000 | NaN |
| 25% | 475.250000 | 532.250000 | 78.000000 | -80.000000 | -23.000000 | 800.000000 | NaN |
| 50% | 948.500000 | 1016.500000 | 160.000000 | -76.000000 | -11.000000 | 853.000000 | NaN |
| 75% | 1421.750000 | 1501.750000 | 259.000000 | -72.000000 | -3.000000 | 883.000000 | NaN |
| max | 1895.000000 | 1977.000000 | 359.000000 | -62.000000 | 11.000000 | 925.000000 | NaN |
6. Based on the dataset’s shape, column types, and statistical summary, what initial observations can you make about the data’s structure, completeness, and potential quality issues?
The dataset contains 1,894 rows and 10 columns, with a mix of numeric, categorical, and date-related variables describing planetary weather. Most columns are complete, but min_temp, max_temp, and pressure each have 27 missing values, while wind_speed is completely empty and likely unusable. The terrestrial_date column is stored as an object and should be converted to a datetime format for time-based analysis, and the month column may need cleaning due to its "Month X" format. The temperature and pressure values suggest a very cold, low-pressure environment, possibly indicating Mars. Overall, the dataset is mostly clean, but minor data type corrections and handling of missing values are needed before deeper analysis.
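The two type fixes mentioned above can be sketched as follows. This is a minimal illustration on a small hand-made sample, not a required step of the milestone; the `sample` DataFrame and the `month_num` column are hypothetical names introduced only for this example.

```python
import pandas as pd

# Small hand-made sample mimicking two of the dataset's columns (illustrative only)
sample = pd.DataFrame({
    'terrestrial_date': ['2018-02-27', '2018-02-26', '2013-02-24'],
    'month': ['Month 5', 'Month 5', 'Month 10'],
})

# Convert the yyyy-mm-dd strings to real datetime values
sample['terrestrial_date'] = pd.to_datetime(sample['terrestrial_date'], format='%Y-%m-%d')

# Pull the digits out of "Month X" so months can be treated as numbers
# ('month_num' is a hypothetical helper column, not part of the dataset)
sample['month_num'] = sample['month'].str.extract(r'(\d+)', expand=False).astype(int)

print(sample.dtypes)
```

The same two lines could be applied to the `weather` DataFrame itself, although the rest of this milestone works with the original string columns.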
Now that you’ve inspected the data, your next step is to check for any columns that might need to be cleaned or removed.
1. Are there any columns with mostly missing values? Perhaps the wind speed sensor was broken! Remove this column from the dataframe.
# Delete wind_speed column, which is filled with null values
weather.drop('wind_speed', axis = 1, inplace = True)
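Rather than spotting empty columns by eye, the share of missing values per column can be computed directly. The sketch below uses a small hand-made sample; the 0.9 cutoff is an arbitrary assumption for illustration, not a rule from the milestone.

```python
import numpy as np
import pandas as pd

# Hand-made sample with one completely empty column (illustrative only)
sample = pd.DataFrame({
    'min_temp': [-77.0, np.nan, -76.0, -78.0],
    'pressure': [727.0, 728.0, np.nan, 730.0],
    'wind_speed': [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column (0.0 = complete, 1.0 = empty)
missing_fraction = sample.isnull().mean()

# Hypothetical cutoff: treat columns that are more than 90% missing as unusable
threshold = 0.9
mostly_missing = missing_fraction[missing_fraction > threshold].index.tolist()

# Drop those columns and keep the rest
cleaned = sample.drop(columns=mostly_missing)
print(mostly_missing)
print(cleaned.columns.tolist())
```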
2A. Now, check for columns that might not add much value to your analysis.
Are there any columns where almost every value is the same?
Take a close look at the atmo_opacity column. How many unique values are there? How frequent are they?
# How many unique values are there in the atmo_opacity column?
print(weather['atmo_opacity'].nunique())
# Show the frequency of each value
print(weather['atmo_opacity'].value_counts())
2
Sunny    1891
--          3
Name: atmo_opacity, dtype: int64
2B. The atmosphere sensor seems to have been faulty and did not capture useful data: all but 3 of the 1,894 rows read "Sunny". Drop this column, since its values are nearly identical.
# Drop the atmo_opacity column
weather.drop('atmo_opacity', axis = 1, inplace = True)
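One way to check for near-constant columns in general is to measure how much of each column is taken up by its most frequent value. The sketch below runs on a hand-made sample; `top_value_share` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

# Hand-made sample: one varied column and one nearly constant column
sample = pd.DataFrame({
    'month': ['Month 5', 'Month 5', 'Month 6', 'Month 6', 'Month 7'],
    'atmo_opacity': ['Sunny', 'Sunny', 'Sunny', 'Sunny', '--'],
})

def top_value_share(series):
    """Return the share of rows taken by the most frequent value."""
    return series.value_counts(normalize=True, dropna=False).iloc[0]

# Share of the dominant value in every column
shares = {col: top_value_share(sample[col]) for col in sample.columns}
print(shares)
```

In the real dataset, "Sunny" accounts for 1,891 of 1,894 rows (about 99.8%), so the column carries almost no information.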
Let’s explore trends in the planetary weather using groupings and charts. This will help you uncover seasonal patterns and key environmental characteristics of the mystery planet.
You’ll need to use the .groupby() method here to analyze how temperature and pressure vary across months. Store your grouped results in new DataFrames to make them easier to visualize.
1. How many months are there on this planet?
# Number of unique months
print(weather['month'].nunique())
12
2A. What is the average minimum temperature for each month?
# Average min_temp each month
avg_min_temp_by_month = weather.groupby('month').agg({'min_temp' : 'mean'}).reset_index()
print(avg_min_temp_by_month)
       month   min_temp
0    Month 1 -77.160920
1   Month 10 -71.982143
2   Month 11 -71.985507
3   Month 12 -74.451807
4    Month 2 -79.932584
5    Month 3 -83.307292
6    Month 4 -82.747423
7    Month 5 -79.308725
8    Month 6 -75.299320
9    Month 7 -72.281690
10   Month 8 -68.382979
11   Month 9 -69.171642
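Note that the grouped output above is ordered as text, so "Month 10" comes before "Month 2". If a numeric order is preferred for reading or plotting, one option is to sort on the number inside the label. This is a minimal sketch using a few rows copied from the output above; `month_num` is a hypothetical helper column.

```python
import pandas as pd

# A few rows copied from the grouped output above
grouped = pd.DataFrame({
    'month': ['Month 1', 'Month 10', 'Month 11', 'Month 2', 'Month 3'],
    'min_temp': [-77.160920, -71.982143, -71.985507, -79.932584, -83.307292],
})

# Extract the number from "Month X" into a helper column (hypothetical name)
grouped['month_num'] = grouped['month'].str.extract(r'(\d+)', expand=False).astype(int)

# Sort numerically instead of alphabetically
grouped_sorted = grouped.sort_values('month_num').reset_index(drop=True)
print(grouped_sorted)
```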
2B. Use your grouped results above to plot a bar chart of the average minimum temperature by month.
# Bar chart of the average min_temp by month
fig = px.bar(avg_min_temp_by_month,
x = 'month',
y = 'min_temp',
title = 'Average Minimum Temperature by Month',
labels = {'month':'Month', 'min_temp':'Avg Min Temp (C)'},
color = 'min_temp', # Adds color scale based on temperature values
color_continuous_scale='RdBu') # Choose a color scale
fig.update_layout(
title={'x': 0.5}, # Centers the title horizontally
coloraxis_colorbar=dict(title='Temp (C)') # Rename color bar
)
fig.show()
2C. Based on the minimum temperature, what is the coldest month? What is the warmest month?
The coldest month based on average minimum temperature is Month 3, while the warmest month is Month 8.
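The same answer can be read off programmatically instead of from the chart. The sketch below rebuilds the grouped averages from the printed output and looks up the rows with the lowest and highest values using `idxmin()` and `idxmax()`.

```python
import pandas as pd

# Monthly averages copied from the grouped output above
avg_min_temp = pd.DataFrame({
    'month': ['Month 1', 'Month 10', 'Month 11', 'Month 12', 'Month 2', 'Month 3',
              'Month 4', 'Month 5', 'Month 6', 'Month 7', 'Month 8', 'Month 9'],
    'min_temp': [-77.160920, -71.982143, -71.985507, -74.451807, -79.932584, -83.307292,
                 -82.747423, -79.308725, -75.299320, -72.281690, -68.382979, -69.171642],
})

# idxmin/idxmax return the row label of the smallest/largest value
coldest = avg_min_temp.loc[avg_min_temp['min_temp'].idxmin(), 'month']
warmest = avg_min_temp.loc[avg_min_temp['min_temp'].idxmax(), 'month']
print(coldest, warmest)
```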
3A. What is the average pressure for each month?
# What is the average pressure for each month?
avg_pressure_by_month = weather.groupby('month').agg({'pressure': 'mean'}).reset_index()
print(avg_pressure_by_month)
       month    pressure
0    Month 1  862.488506
1   Month 10  887.312500
2   Month 11  857.014493
3   Month 12  842.156627
4    Month 2  889.455056
5    Month 3  877.322917
6    Month 4  806.329897
7    Month 5  748.557047
8    Month 6  745.054422
9    Month 7  795.105634
10   Month 8  873.829787
11   Month 9  913.305970
3B. Use your grouped results above to plot a bar chart of the average atmospheric pressure by month.
# Bar chart of the average atmospheric pressure by month
fig = px.bar(
avg_pressure_by_month,
x='month',
y='pressure',
title='Average Atmospheric Pressure by Month',
labels={'month': 'Month', 'pressure': 'Avg Pressure (Pa)'},
color='pressure',
color_continuous_scale='Viridis'
)
# Center the title
fig.update_layout(title={'x': 0.5})
fig.show()
4. Plot a line chart of the daily atmospheric pressure by terrestrial date.
# Line chart of the daily atmospheric pressure by terrestrial date
fig = px.line(
weather,
x='terrestrial_date',
y='pressure',
title='Daily Atmospheric Pressure Over Time',
labels={'terrestrial_date': 'Date', 'pressure': 'Pressure (Pa)'}
)
# Center the title
fig.update_layout(title={'x': 0.5})
fig.show()
5. Plot a line chart of the daily minimum temperature by terrestrial date.
# Line chart of the daily minimum temperature by terrestrial date
fig = px.line(
weather,
x='terrestrial_date',
y='min_temp',
title='Daily Minimum Temperature Over Time',
labels={'terrestrial_date': 'Date', 'min_temp': 'Min Temperature (C)'}
)
# Center the title
fig.update_layout(title={'x': 0.5})
fig.show()
6. Based on this information, approximately how many Earth days are there in a year on this planet?
Based on the visual patterns, there are approximately 687 Earth days in a year on this planet, which closely matches the Martian year.
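The year length can also be estimated numerically from the ls column, since solar longitude runs from 0 to 359 once per planetary year. The sketch below uses a synthetic series rather than the real file: it assumes daily rows and an ls that advances uniformly and wraps every 687 days, which is a simplification made up for the demo. The names `demo`, `wraps`, and `year_lengths` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset: 1,500 consecutive Earth days with a
# solar longitude that wraps every 687 days (an assumption for this demo)
dates = pd.date_range('2012-08-07', periods=1500, freq='D')
ls = (150 + np.arange(1500) * 360 / 687) % 360
demo = pd.DataFrame({'terrestrial_date': dates, 'ls': ls.astype(int)})

# Work in chronological order
demo = demo.sort_values('terrestrial_date')

# A new year starts where ls drops from near 359 back to near 0
wraps = demo.loc[demo['ls'].diff() < -180, 'terrestrial_date']

# Earth days between consecutive year starts
year_lengths = wraps.diff().dropna().dt.days
print(year_lengths.tolist())
```

On the synthetic series the gap is 687 days by construction. Applied to the real data, the result would only be approximate, because some days are missing from the record.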
7. What is the identity of the planet? Go to this website and see which planet this lines up with!
Based on the data patterns and comparison with NASA’s information, the planet is Mars.
Earlier in the milestone, you investigated how many months there are on our mystery planet. Unfortunately, the answer (12) was not very satisfying, because there is no standard calendar for Mars. The dataset divides the Martian year into 12 "months", though each one is longer than a typical Earth month. Let's investigate!
First, filter your dataset so that you are only looking at any terrestrial_date before 2014.
# filter to all values where terrestrial_date is before 2014
# store it in a new variable.
pre_2014_data = weather[weather['terrestrial_date'] < '2014']
# show dataframe
print(pre_2014_data)
        id terrestrial_date  sol   ls    month  min_temp  max_temp  pressure
1453   432       2013-12-31  499   69  Month 3     -84.0     -30.0     899.0
1454   424       2013-12-30  498   69  Month 3     -86.0     -28.0     901.0
1455   425       2013-12-29  497   69  Month 3     -86.0     -30.0     901.0
1456   428       2013-12-28  496   68  Month 3     -85.0     -26.0     901.0
1457   431       2013-12-27  495   68  Month 3     -86.0     -26.0     900.0
...    ...              ...  ...  ...      ...       ...       ...       ...
1889    24       2012-08-18   12  156  Month 6     -76.0     -18.0     741.0
1890    13       2012-08-17   11  156  Month 6     -76.0     -11.0     740.0
1891     2       2012-08-16   10  155  Month 6     -75.0     -16.0     739.0
1892   232       2012-08-15    9  155  Month 6       NaN       NaN       NaN
1893     1       2012-08-07    1  150  Month 6       NaN       NaN       NaN

[441 rows x 8 columns]
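The filter above compares strings, which works here because yyyy-mm-dd text sorts in the same order as the dates it represents. A minimal sketch on a hand-made sample (the `sol` values are invented for illustration) shows that the string filter and a filter on parsed datetimes select the same rows.

```python
import pandas as pd

# Hand-made sample with one date on each side of the cutoff (illustrative only)
sample = pd.DataFrame({
    'terrestrial_date': ['2014-01-01', '2013-12-31', '2012-08-07'],
    'sol': [500, 499, 1],
})

# String comparison: '2013-...' sorts before '2014', '2014-01-01' does not
by_string = sample[sample['terrestrial_date'] < '2014']

# Equivalent filter on parsed datetime values
parsed = pd.to_datetime(sample['terrestrial_date'])
by_datetime = sample[parsed < pd.Timestamp('2014-01-01')]

print(by_string.equals(by_datetime))
```

The string approach would stop being reliable if the dates were stored in another format such as mm/dd/yyyy, so parsing first is the safer habit.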
Lastly, for each month in the dataframe, return both the min value and the max value of the terrestrial_date field.
# For each month, calculate the minimum AND maximum terrestrial_date
date_range_by_month = pre_2014_data.groupby('month').agg({'terrestrial_date': ['min', 'max']})
# Display the result
print(date_range_by_month)
terrestrial_date
min max
month
Month 1 2013-08-01 2013-10-02
Month 10 2013-02-24 2013-04-12
Month 11 2013-04-13 2013-06-04
Month 12 2013-06-05 2013-07-31
Month 2 2013-10-03 2013-12-08
Month 3 2013-12-09 2013-12-31
Month 6 2012-08-07 2012-09-29
Month 7 2012-09-30 2012-11-19
Month 8 2012-11-20 2013-01-07
Month 9 2013-01-08 2013-02-23
How many Earth days, roughly, are there in each "month" on the mystery planet? Does that line up with what you expected now that you know the identity of the mystery planet?
Each "month" on the mystery planet (Mars) lasts roughly 47 to 67 Earth days, leaving aside Month 3, which is cut off by the 2014 filter. That is consistent with a Martian year of about 687 Earth days split into 12 months, which averages out to about 57 Earth days per month.
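The month lengths can be checked directly from the date ranges printed above. The sketch below copies those ranges into a small DataFrame and counts the days inclusively. Month 3 is left out because the 2014 filter cuts it short, and Month 6 is left out because it begins with the first recorded sol, so it may not be a full month. The `ranges` and `earth_days` names are hypothetical.

```python
import pandas as pd

# Date ranges copied from the grouped output above (Months 3 and 6 omitted)
ranges = pd.DataFrame({
    'month': ['Month 7', 'Month 8', 'Month 9', 'Month 10',
              'Month 11', 'Month 12', 'Month 1', 'Month 2'],
    'start': ['2012-09-30', '2012-11-20', '2013-01-08', '2013-02-24',
              '2013-04-13', '2013-06-05', '2013-08-01', '2013-10-03'],
    'end': ['2012-11-19', '2013-01-07', '2013-02-23', '2013-04-12',
            '2013-06-04', '2013-07-31', '2013-10-02', '2013-12-08'],
})

start = pd.to_datetime(ranges['start'])
end = pd.to_datetime(ranges['end'])

# Inclusive length in Earth days: both the first and the last day count
ranges['earth_days'] = (end - start).dt.days + 1
print(ranges[['month', 'earth_days']])
```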